2  Explaining predictions

In this chapter, we will

navigate the accuracy-explainability trade-off for public policy (Bell et al. 2022)

see that what is explainable differs between stakeholders (Amarasinghe et al. 2023)

discuss why improving biodiversity outcomes needs sustained uptake of model results (Weiskopf et al. 2022)

2.0.1 Partial responses

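The partial response to a variable is the prediction of the model across the values of this variable, with all other variables held at their mean.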
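A minimal sketch of this computation in plain Julia, independent of the modelling package used later in the chapter. The function name `partial_response`, the toy `predictor`, and the features × instances layout of `X` are assumptions made for illustration.

```julia
using Statistics

# Partial response of variable `v`: predictions along a grid of values for `v`,
# with every other variable fixed at its mean.
# `X` is a features × instances matrix; `predictor` maps one feature vector to a score.
function partial_response(predictor, X::AbstractMatrix, v::Integer; steps::Integer=50)
    baseline = vec(mean(X; dims=2))              # mean of every variable
    lo, hi = extrema(X[v, :])
    grid = collect(range(lo, hi; length=steps))
    response = map(grid) do value
        x = copy(baseline)
        x[v] = value                             # only variable v moves
        predictor(x)
    end
    return grid, response
end

# Hypothetical toy model: logistic response to two variables
predictor(x) = 1 / (1 + exp(-(0.8 * x[1] - 0.5 * x[2])))

X = [0.0 1.0 2.0 3.0; 4.0 2.0 0.0 2.0]
grid, response = partial_response(predictor, X, 1; steps=4)
```

Because the toy model increases with the first variable, `response` increases along `grid`; the second variable only shifts the curve through its mean.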

2.0.2 Inflated partial responses

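Instead of being held at their mean, the other (background) variables are sampled.

This is still a measure of the global response of the model, because the values of the variables are kept but the structure between them is lost.

Inflated partial responses give a better sense of potentially divergent responses.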
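One possible reading of this idea, sketched in plain Julia: each background variable is drawn independently from its observed values, and the curve is recomputed for several draws. The function name `inflated_partial_response`, the sampling scheme, and the toy `predictor` are assumptions for illustration, and may differ from what the modelling package does internally.

```julia
using Random, Statistics

# One inflated partial response curve for variable `v`: the other variables are
# set to values drawn independently from their observed values, so observed
# values are kept but the correlation between variables is lost.
function inflated_partial_response(predictor, X::AbstractMatrix, v::Integer;
                                   steps::Integer=50,
                                   rng::AbstractRNG=Random.default_rng())
    background = [rand(rng, X[j, :]) for j in axes(X, 1)]
    lo, hi = extrema(X[v, :])
    grid = collect(range(lo, hi; length=steps))
    response = map(grid) do value
        x = copy(background)
        x[v] = value                             # only variable v follows the grid
        predictor(x)
    end
    return grid, response
end

# Hypothetical toy model with an interaction between the two variables
predictor(x) = 1 / (1 + exp(-(x[1] * x[2])))

rng = MersenneTwister(1234)
X = randn(rng, 2, 100)
curves = [inflated_partial_response(predictor, X, 1; steps=20, rng=rng)[2] for _ in 1:10]
```

In this toy model, a curve increases or decreases depending on the sign of the value sampled for the second variable, which is the kind of divergence a single mean-based curve cannot show.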

2.0.3 Shapley values

Shapley values measure local (prediction-scale) importance

Monte Carlo approximation of Shapley values (Štrumbelj & Kononenko 2013)

mapping of Shapley values (Wadoux et al. 2023)

mapping of most important covariates (Mesgaran et al. 2014)

SHAP (Lundberg & Lee 2017)

important properties and interpretation
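The Monte Carlo approximation of Štrumbelj & Kononenko (2013) can be sketched as follows. This is a minimal, package-independent illustration and not the implementation behind the `explain` function used later in the chapter; the names `shapley_mc` and `predictor`, the toy linear model, and the random data are assumptions.

```julia
using Random, Statistics

# Monte Carlo estimate of the Shapley value of variable `j` for the instance `x`.
# `X` is a features × instances matrix of reference data.
function shapley_mc(predictor, X::AbstractMatrix, x::AbstractVector, j::Integer;
                    samples::Integer=200, rng::AbstractRNG=Random.default_rng())
    nfeatures, ninstances = size(X)
    ϕ = 0.0
    for _ in 1:samples
        order = randperm(rng, nfeatures)          # random ordering of the variables
        w = X[:, rand(rng, 1:ninstances)]         # random reference instance
        pos = findfirst(==(j), order)
        with_j = copy(w)
        with_j[order[1:pos]] .= x[order[1:pos]]   # variables up to and including j come from x
        without_j = copy(with_j)
        without_j[j] = w[j]                       # same, but j comes from the reference
        ϕ += predictor(with_j) - predictor(without_j)
    end
    return ϕ / samples
end

# Hypothetical linear model and data
predictor(x) = 2.0 * x[1] - 1.0 * x[2] + 0.5 * x[3]
rng = MersenneTwister(42)
X = randn(rng, 3, 500)
x = [1.0, -1.0, 2.0]
ϕ = [shapley_mc(predictor, X, x, j; samples=2000, rng=rng) for j in 1:3]
```

For a linear model, each estimate is the coefficient of the variable times the difference between its value in `x` and the average sampled reference value. The values in `ϕ` approximately sum to the difference between the prediction for `x` and the average prediction over `X`, which is why Shapley values are read as changes from the average prediction.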

2.0.4 Importance of transformations as part of the model

When the transformation is part of the model, we can still apply these techniques to the original variables, instead of asking “what does PC1 = 0.4 mean”.
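To illustrate, here is a minimal sketch in which a PCA is fitted on the raw variables and wrapped together with a classifier, so that the function handed to an explanation method takes raw values as input. The data, the `classifier`, and the names `transform` and `model_on_raw` are all hypothetical.

```julia
using LinearAlgebra, Random, Statistics

rng = MersenneTwister(1)
X = randn(rng, 3, 200)                   # raw variables (features × instances)
X[2, :] .+= 0.8 .* X[1, :]               # make two variables correlated

# PCA fitted on the raw data: centre, then project on the eigenvectors of the covariance
μ = vec(mean(X; dims=2))
W = eigvecs(Symmetric(cov(X; dims=2)))   # eigenvalues ascending: last column is PC1
transform(x) = W' * (x .- μ)

# Hypothetical classifier working on the PCA scores
classifier(z) = 1 / (1 + exp(-(1.5 * z[end] - 0.5 * z[1])))

# The model exposed to explanation methods takes raw values as input
model_on_raw(x) = classifier(transform(x))

# Effect on the score of increasing the first raw variable by one unit
x = copy(μ)
x[1] += 1.0
Δ = model_on_raw(x) - model_on_raw(μ)
```

The question answered by `Δ` is phrased in the units of the first raw variable, even though the classifier itself only sees principal components.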

2.1 Application

2.1.1 Partial responses

Figure 2.1: TODO
Figure 2.2: TODO
Figure 2.3: TODO

2.1.2 Shapley values

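# Shapley values: one row per variable, one column per training instance
S = zeros(Float64, (length(variables(model)), length(labels(model))))
for (vidx, vpos) in enumerate(variables(model))
    # Monte Carlo estimate (200 samples) of the effect of this variable on the score
    S[vidx,:] = explain(model, vpos; threshold=false, samples=200)
end
# Values of the selected variables, in the same layout as S
P = features(model, variables(model))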
4×2230 Matrix{Float64}:
  6.15002   8.75   8.45001  13.75  …  13.85  14.65   9.25  14.05  13.85
 26.3      22.1   25.2      19.6      21.7   18.6   26.4   18.8   19.3
 42.2      40.7   44.3      45.0      49.6   44.9   47.1   43.6   49.6
 23.15     22.95  24.85     26.45     28.15  26.85  26.35  26.35  26.35

TODO redraw the stemplot from the variable selection chapter to compare prediction v. explanation

Table 2.1: TODO
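| Variable | Imp. (Shapley) | Imp. (bootstrap) | Min. | Med. | Max. |
|----------|----------------|------------------|-------|-------|------|
| BIO 8    | 39.59%         | 35.25%           | -0.29 | -0.09 | 0.56 |
| BIO 7    | 33.74%         | 25.45%           | -0.40 | -0.05 | 0.30 |
| BIO 15   | 14.91%         | 8.88%            | -0.38 | 0.00  | 0.18 |
| BIO 5    | 11.76%         | 6.61%            | -0.24 | -0.01 | 0.20 |
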
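One common way to turn a matrix of Shapley values into relative importances and per-variable summaries is sketched below. The values in `S` are made up, and normalising by the total absolute contribution is an assumption; it is not necessarily how the numbers in Table 2.1 were obtained.

```julia
using Statistics

# Hypothetical Shapley values: one row per variable, one column per prediction
S = [ 0.30 -0.20  0.10 -0.40;
     -0.10  0.05  0.00  0.15;
      0.02 -0.02  0.04 -0.02]

# Relative importance: share of the total absolute contribution
absolute = vec(sum(abs, S; dims=2))
importance = absolute ./ sum(absolute)

# Minimum, median, and maximum effect of each variable across predictions
effect_summary = [(minimum(r), median(r), maximum(r)) for r in eachrow(S)]
```

With this convention, the importances sum to one, and a variable can rank high even when its median effect is close to zero, as long as it has large effects on some predictions.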
Figure 2.4: Effect of each variable (sorted by importance as in Table 2.1) on the change of the score for a single prediction. Recall that this is expressed as the change from the average prediction made by the model.
Figure 2.5: Effect of each variable (sorted by importance as in Table 2.1) on the change of the score for a single prediction. Recall that this is expressed as the change from the average prediction made by the model.
# varord is assumed to hold the variable indices sorted by decreasing importance (as in Table 2.1)
f = Figure(; size=(600, 400))
args = (color=predict(model), markersize=5, colorrange=(0., 1.))

# One panel per variable: Shapley value against the value of the variable
for (i, (row, col)) in enumerate([(1, 1), (1, 2), (2, 1), (2, 2)])
    ax = Axis(f[row, col]; xlabel="BIO $(model.v[varord[i]])")
    scatter!(ax, P[varord[i],:], S[varord[i],:]; args...)
    # Dashed line at zero: no change from the average prediction
    hlines!(ax, [0.0], color=:black, linestyle=:dash)
end

current_figure()

2.1.3 Spatial partial effects

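# Path to the multi-band GeoTIFF stored in the directory of the active project
_layer_path = joinpath(dirname(Base.active_project()), "data", "occurrences", "layers.tiff")
# Read each of the 19 bands as a separate layer
bio = [SimpleSDMLayers._read_geotiff(_layer_path; bandnumber=i) for i in 1:19]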
19-element Vector{SDMLayer{Float32}}:
 SDMLayer{Float32}(Float32[Inf Inf … Inf Inf; Inf Inf … Inf Inf; … ; Inf Inf … Inf Inf; Inf Inf … Inf Inf], Bool[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], (8.533193690349995, 9.549860352949993), (41.33319391914997, 43.00819391245001), "+proj=longlat +datum=WGS84 +no_defs")
 ⋮
V = explain(model, bio; threshold=false, samples=30)
4-element Vector{SDMLayer{Float64}}:
 SDMLayer{Float64}([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], Bool[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], (8.533193690349995, 9.549860352949993), (41.33319391914997, 43.00819391245001), "+proj=longlat +datum=WGS84 +no_defs")
 ⋮
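# Left: map of the Shapley values for the most important variable
# Right: partial response to the same variable, mapped in space
f = Figure()
a1 = Axis(f[1,1])
a2 = Axis(f[1,2])
# bkcol.div is assumed to be a diverging palette defined elsewhere
heatmap!(a1, V[varord[1]], colormap=bkcol.div, colorrange=(-0.5,0.5))
heatmap!(a2, partialresponse(model, bio, variables(model)[varord[1]]; threshold=false))
current_figure()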

2.1.4 Most important variable locally

Figure 2.6: TODO
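One plausible way to identify the most important variable at each location, assuming Shapley values are already available for every variable and location, is to take the variable with the largest absolute contribution. The matrix `S` below is made up for illustration.

```julia
# Hypothetical Shapley values: one row per variable, one column per location
S = [ 0.30 -0.05  0.10;
     -0.10  0.20  0.00;
      0.02 -0.02 -0.40]

# Index of the variable with the largest absolute contribution at each location
most_important = [argmax(abs.(col)) for col in eachcol(S)]
```

The absolute value matters here: in the third location, the third variable is the most important because it decreases the score the most.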

2.2 Conclusion

References

Amarasinghe, K., Rodolfa, K.T., Lamba, H. & Ghani, R. (2023). Explainable machine learning for public policy: Use cases, gaps, and research directions. Data & Policy, 5.
Bell, A., Solano-Kamaiko, I., Nov, O. & Stoyanovich, J. (2022). It's just not that simple: An empirical study of the accuracy-explainability trade-off in machine learning for public policy. 2022 ACM Conference on Fairness, Accountability, and Transparency.
Lundberg, S.M. & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In: Advances in neural information processing systems (eds. Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., et al.). Curran Associates, Inc.
Mesgaran, M.B., Cousens, R.D. & Webber, B.L. (2014). Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Diversity and Distributions, 20, 1147–1159.
Štrumbelj, E. & Kononenko, I. (2013). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41, 647–665.
Wadoux, A.M.J.-C., Saby, N.P.A. & Martin, M.P. (2023). Shapley values reveal the drivers of soil organic carbon stock prediction. SOIL, 9, 21–38.
Weiskopf, S.R., Harmáčková, Z.V., Johnson, C.G., Londoño-Murcia, M.C., Miller, B.W., Myers, B.J.E., et al. (2022). Increasing the uptake of ecological model results in policy decisions to improve biodiversity outcomes. Environmental Modelling & Software, 149, 105318.